age = 20
if age >= 18:
    print("Adult")
Adult
Conditional statements such as if, if ... else, and elif are essential in Python for controlling an analysis pipeline and automating tasks and decisions. The logic closely resembles that of R but, as previously seen, Python uses indentation (not curly braces) to define blocks of code.
if statement
Performs an action only if a condition is met, as in the age example at the top of this section.
[Figure: basic flowchart showing the logic of the if statement]
if...else statement
Sometimes you need to perform alternative, mutually exclusive actions:
Note that indentation is really important!
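A minimal sketch of an if...else block, reusing the age example. The value 15 is an illustrative assumption, not taken from the original material. The indented lines under if and else are the two mutually exclusive branches; removing the indentation would raise an IndentationError.

```python
age = 15  # hypothetical value chosen for illustration

if age >= 18:
    label = "Adult"   # runs only when the condition is True
else:
    label = "Minor"   # runs only when the condition is False

print(label)  # prints: Minor
```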
if...elif...else statement
When you need to evaluate more than two alternative conditions, you can chain them with if...elif...else, which works like a series of nested conditional statements:
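A minimal sketch of such a chain. The age value and the cut-offs (18, 13, 2) are illustrative assumptions, chosen to match the age groups used in the np.select() example of this section. Conditions are checked from top to bottom and only the first branch whose condition is True is executed.

```python
age = 15  # hypothetical value chosen for illustration

# Conditions are checked top to bottom; only the first True branch runs
if age >= 18:
    group = "Adult"
elif age >= 13:
    group = "Adolescent"
elif age >= 2:
    group = "Child"
else:
    group = "Infant"

print(group)  # prints: Adolescent
```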
Example of automated decision in a hypothetical pre-registered analysis pipeline:
import numpy as np
import scipy.stats as st
x1 = np.random.normal(0, 1, size=30)
x2 = np.random.normal(0.5, 1, size=30)
tt = st.ttest_ind(x1, x2)
print(tt.pvalue.round(4))
0.7787
if tt.pvalue < 0.05:
    print("Significant result: proceeding with follow-up analysis")
    # Here you could perform other analyses after the preliminary check
else:
    print("No significant result: reporting preliminary test only")
No significant result: reporting preliminary test only
All previous examples evaluated a single condition that is either True or False. However, you often want to apply this operation to an entire vector:
agesVector = np.array([2, 28, 15, 11, 4, 67, 0, 42, 14, 8])
if agesVector >= 18:
    print("Adult")
else:
    print("Minor")
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The error message suggests using np.any(agesVector >= 18) or np.all(agesVector >= 18), but neither is what we want here: each collapses the whole vector into a single True or False. What we actually want is an if...else that is evaluated element by element across a vector of Trues and Falses, like ifelse() in R.
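To make the difference concrete, here is a short sketch of what the two functions named in the error message return for the same vector. The variable name is_adult is introduced only for this illustration.

```python
import numpy as np

agesVector = np.array([2, 28, 15, 11, 4, 67, 0, 42, 14, 8])
is_adult = agesVector >= 18   # Boolean array, one entry per element

print(is_adult)
print(np.any(is_adult))   # True: at least one element is >= 18
print(np.all(is_adult))   # False: not every element is >= 18
```

Both calls reduce the ten comparisons to one value, so neither can label each element separately.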
np.where() and np.select()
agesVector = np.array([2, 28, 15, 11, 4, 67, 0, 42, 14, 8])
np.where(agesVector >= 18, "Adult", "Minor")
array(['Minor', 'Adult', 'Minor', 'Minor', 'Minor', 'Adult', 'Minor',
       'Adult', 'Minor', 'Minor'], dtype='<U5')
np.where() handles a single condition, similar to ifelse() in R.
conditions = [agesVector >= 18, agesVector >= 13, agesVector >= 2]
choices = ["Adult", "Adolescent", "Child"]
np.select(conditions, choices, default="Infant")
array(['Child', 'Adult', 'Adolescent', 'Child', 'Child', 'Adult',
       'Infant', 'Adult', 'Adolescent', 'Child'], dtype='<U10')
np.select() handles multiple nested conditions. It has no direct equivalent in base R; the closest match is dplyr::case_when().
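The order of the conditions matters: for each element, np.select() returns the choice of the first condition that is True. The sketch below illustrates this by reproducing the same result with nested np.where() calls; the names selected and nested are introduced only for this comparison.

```python
import numpy as np

agesVector = np.array([2, 28, 15, 11, 4, 67, 0, 42, 14, 8])

# np.select picks, for each element, the first condition that is True
conditions = [agesVector >= 18, agesVector >= 13, agesVector >= 2]
choices = ["Adult", "Adolescent", "Child"]
selected = np.select(conditions, choices, default="Infant")

# Same logic written as nested np.where calls
nested = np.where(agesVector >= 18, "Adult",
         np.where(agesVector >= 13, "Adolescent",
         np.where(agesVector >= 2, "Child", "Infant")))

print(np.array_equal(selected, nested))  # prints: True
```

Listing the conditions from the most to the least restrictive is what makes each element fall into the intended group.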
Loops are used in Python to repeat actions. The most common loop types are for and while.
Repeat a data simulation to estimate the standard error of the mean:
import numpy as np
N = 30
niter = 10
np.random.seed(0) # set seed for reproducibility: best practice!
results = np.empty(niter) # initialize empty vector: best practice!
for i in range(niter):
    x = np.random.normal(size=N)
    results[i] = x.mean()
print(results.round(4))
[ 0.4429 -0.2895 -0.1337  0.5108  0.0965 -0.0672 -0.1006 -0.0776 -0.304
  0.1978]
print(np.std(results).round(3)) # SD of the simulated means = estimated standard error
0.267
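As a sanity check on this kind of simulation, the standard deviation of the simulated means should approach the analytic standard error, sigma / sqrt(N), as the number of iterations grows. The sketch below is an illustration under stated assumptions: it uses 10,000 iterations instead of 10 and NumPy's Generator API (np.random.default_rng) instead of np.random.seed, so its numbers will not match the output above.

```python
import numpy as np

N = 30          # sample size, as in the loop above
niter = 10_000  # assumption: many more iterations than the original 10
rng = np.random.default_rng(0)  # assumption: Generator API instead of np.random.seed

means = np.empty(niter)
for i in range(niter):
    x = rng.normal(size=N)   # standard normal data, so sigma = 1
    means[i] = x.mean()

simulated_se = means.std(ddof=1)   # spread of the simulated means
theoretical_se = 1 / np.sqrt(N)    # sigma / sqrt(N) with sigma = 1

print(round(simulated_se, 3), round(theoretical_se, 3))
```

With only 10 iterations the estimate is itself very noisy, which is why it can sit far from the theoretical value of about 0.183.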
Iterating over a sequence of integers (e.g., “for i in range(niter)”) is common practice; however, you can also iterate directly over the elements of a list or other data structure:
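The original code for this example is not shown. A minimal sketch, assuming a list of lower-case strings (a hypothetical list named words) converted with str.upper(), would print the upper-case words listed just below:

```python
words = ["this", "is", "a", "vector", "of", "strings"]  # hypothetical list

for w in words:        # w takes each element in turn, no index needed
    print(w.upper())
```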
THIS
IS
A
VECTOR
OF
STRINGS
List comprehension is another, compact type of for loop over list elements:
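A short sketch comparing an explicit for loop with the equivalent list comprehension. The list words and the result names are hypothetical, introduced only for this illustration.

```python
words = ["this", "is", "a", "vector", "of", "strings"]  # hypothetical list

# Explicit for loop that builds a new list
upper_loop = []
for w in words:
    upper_loop.append(w.upper())

# Same result as a list comprehension
upper_comp = [w.upper() for w in words]

# An optional condition filters the elements
long_words = [w for w in words if len(w) > 2]

print(upper_comp == upper_loop)  # prints: True
print(long_words)                # prints: ['this', 'vector', 'strings']
```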
while loop
The while loop is another classical type of iterative structure. It is useful when the precise number of iterations is unknown a priori and depends on a condition becoming True.
amount = 1000
month = 0
interest_rate = 0.001
while amount < 1500:
    month += 1
    amount += amount * interest_rate
print(month)
406
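In this particular case the number of iterations can also be worked out analytically, which gives a quick check on the loop: the amount after n months is 1000 * (1 + 0.001)^n, so the loop stops at the smallest n for which this reaches 1500. A sketch of that arithmetic (the variable names are introduced here for illustration):

```python
import math

start, target, rate = 1000, 1500, 0.001  # values from the while loop above

# Smallest n with start * (1 + rate)**n >= target
months = math.ceil(math.log(target / start) / math.log(1 + rate))
print(months)  # prints: 406
```

The while loop remains the more general tool, since most stopping conditions do not have a closed-form solution.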
break in loops
The break statement lets you interrupt any loop as soon as a condition is met
import time
import scipy.stats as st
i = 0
pval = 1
Start = time.time()
while pval >= 0.001: # go on until p < 0.001
    i += 1
    x1 = np.random.normal(0,1,size=30)
    x2 = np.random.normal(0,1,size=30)
    tt = st.ttest_ind(x1, x2)
    pval = tt.pvalue
    Now = time.time()
    if Now - Start > 10:
        break # however, stop if overall time exceeds 10 seconds
print([i, pval.round(4)])
[1045, np.float64(0.0004)]
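break works the same way inside a for loop. The sketch below is a small deterministic illustration with hypothetical data: it stops scanning a list as soon as the first value above a threshold is found.

```python
values = [3, 8, 1, 12, 7, 20]  # hypothetical data
first_large = None

for v in values:
    if v > 10:
        first_large = v
        break          # leave the loop as soon as a match is found

print(first_large)  # prints: 12
```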
for with zip()
zip() pairs up the elements of multiple sequences so that you can iterate over them together
teacher = ["Pastore", "Granziol", "Feraco","Altoe"]
course = ["CurrentIssues", "BasicsInference", "SEM","Outliers"]
hours = [10, 20, 20, 5]
for t, c, h in zip(teacher, course, hours):
    print(f"{t} teaches {c}, which has {h} hours")
Pastore teaches CurrentIssues, which has 10 hours
Granziol teaches BasicsInference, which has 20 hours
Feraco teaches SEM, which has 20 hours
Altoe teaches Outliers, which has 5 hours
base = [5, 10, 10, 2, 7, 15]
exponent = [2, 1, 2, 5, 5, 2]
result = [b**e for b, e in zip(base, exponent)] ; print(result)
[25, 10, 100, 32, 16807, 225]
The same result can be obtained with NumPy vectorized operations: np.array(base) ** np.array(exponent)
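A short sketch confirming that the list comprehension with zip() and the NumPy vectorized expression give the same values. The names result_list and result_np are introduced only for this comparison.

```python
import numpy as np

base = [5, 10, 10, 2, 7, 15]
exponent = [2, 1, 2, 5, 5, 2]

result_list = [b**e for b, e in zip(base, exponent)]   # pure Python
result_np = np.array(base) ** np.array(exponent)       # elementwise power

print(result_np.tolist() == result_list)  # prints: True
```

The vectorized form returns a NumPy array rather than a list, which is usually what you want for further numerical work.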
map()
map() applies a specific function to each item in a sequence:
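A minimal sketch, using math.sqrt and a hypothetical input list as the illustration:

```python
import math

values = [1, 4, 9, 16]  # hypothetical input

lazy = map(math.sqrt, values)   # nothing is computed yet
roots = list(lazy)              # list() consumes the iterator

print(roots)  # prints: [1.0, 2.0, 3.0, 4.0]
```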
With map(), you need to wrap the call in list(...) to actually generate the result; otherwise you get an unevaluated, “lazy” map object.
map() is roughly equivalent to lapply()/sapply() in R, and iterating with zip() is roughly equivalent to mapply().
Custom functions are widely used in Python to reuse chunks of code efficiently. Define your functions with def; the logic is very similar to that of R:
a = [10, 14, 7.6, 18, 22, 50, 0.5]
b = [700, 131, 215, 133.2, 190, 4100, 108.9]
c = [-4.2, -10.2, 2, -15]
def zScore(vect):
    vect = np.array(vect)
    mu = np.mean(vect)
    sigma = np.std(vect)
    return (vect - mu) / sigma
zScore(a).round(3)
array([-0.503, -0.233, -0.665,  0.038,  0.308,  2.201, -1.145])
zScore(b).round(3)
array([-0.071, -0.489, -0.427, -0.487, -0.446,  2.425, -0.505])
zScore(c).round(3)
array([ 0.415, -0.525,  1.386, -1.277])
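One difference from R is worth keeping in mind here: np.std() divides by n by default (ddof=0), whereas R's sd() divides by n - 1, so these z-scores differ slightly from those returned by scale() in R. The sketch below is a hypothetical variant, named zScore_ddof for illustration, that exposes this choice as an argument.

```python
import numpy as np

a = [10, 14, 7.6, 18, 22, 50, 0.5]

def zScore_ddof(vect, ddof=0):
    # ddof=0 divides by n (NumPy default); ddof=1 divides by n - 1 (as R's sd())
    vect = np.array(vect)
    return (vect - np.mean(vect)) / np.std(vect, ddof=ddof)

print(zScore_ddof(a).round(3))          # population SD, as in the example above
print(zScore_ddof(a, ddof=1).round(3))  # sample SD, as R's sd() would use
```

The ddof=1 scores are smaller in absolute value because the sample standard deviation is slightly larger.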
def
Let’s elaborate on the custom zScore function a little by adding another argument that lets us specify whether to ignore missing values:
myVector = np.array([10, 14, 7.6, np.nan, 18, 22, 50, 0.5, np.nan, 1.4, 7])
def zScore(vect, naIgnore=True):
    vect = np.array(vect)
    if naIgnore:
        vect = vect[~np.isnan(vect)]
    mu = np.mean(vect)
    sigma = np.std(vect)
    return (vect - mu) / sigma
zScore(myVector, naIgnore=False).round(2)
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
zScore(myVector, naIgnore=True).round(2)
array([-0.32, -0.04, -0.49,  0.25,  0.53,  2.5 , -0.98, -0.92, -0.53])
The ~ operator is the elementwise equivalent of not for Boolean arrays.
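Note that filtering with ~np.isnan() shortens the output (9 values from 11 inputs), so the z-scores no longer line up with the original positions. A possible alternative, sketched here as a hypothetical variant named zScoreKeepNa, uses np.nanmean() and np.nanstd(), which skip NaNs in the computation while leaving them in place in the result.

```python
import numpy as np

myVector = np.array([10, 14, 7.6, np.nan, 18, 22, 50, 0.5, np.nan, 1.4, 7])

def zScoreKeepNa(vect):
    # Hypothetical variant: NaNs are skipped when computing mean and SD,
    # but stay in place in the output
    vect = np.array(vect, dtype=float)
    return (vect - np.nanmean(vect)) / np.nanstd(vect)

z = zScoreKeepNa(myVector)
print(z.round(2))
print(len(z) == len(myVector))  # prints: True
```

Which behaviour is preferable depends on whether the scores need to be matched back to the rows of the original data.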
lambda
The lambda keyword lets you define a function in a single line of code, without def or return. It is useful for quick transformations, but it is limited to a single expression and cannot contain statements or more complex logic.
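A minimal sketch of typical lambda usage. The function name square and the input list are hypothetical, chosen only for illustration.

```python
square = lambda x: x ** 2          # same as: def square(x): return x ** 2
print(square(4))                   # prints: 16

# Typical use: a short throwaway function passed to map() or sorted()
values = [3, -7, 1, -2]            # hypothetical input
scaled = list(map(lambda v: v * 10, values))
by_abs = sorted(values, key=lambda v: abs(v))

print(scaled)  # prints: [30, -70, 10, -20]
print(by_abs)  # prints: [1, -2, 3, -7]
```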
| Task | Python | R |
|---|---|---|
| Basic if | if cond: | if(cond){ } |
| if … else | if cond: ... else: ... | if(cond){ } else { } |
| Multiple conditions | if cond1: ... elif cond2: ... else: ... | if(cond1){ } else if(cond2){ } else { } |
| Block delimiter | indentation | { } |
| “not” elementwise | ~cond | !cond |
| Multiple checks | (a > 1) & (b < 5) | (a > 1) & (b < 5) |
| Vectorized condition | np.where(conds, ifT, ifF) | ifelse(conds, ifT, ifF) |
| Multiple/nested vectorized conditions | np.select([...], [...]) | dplyr::case_when() |

| Task | Python | R |
|---|---|---|
| Loop over integers | for i in range(n): | for(i in 1:n){ } |
| Loop over elements | for a in A: | for(a in A){ } |
| While loop | while cond: | while(cond){ } |
| Block delimiter | indentation | { } |
| Break loop | break | break |
| Apply function (list) | list(map(func, A)) | lapply(A, func) |
| Multilist iteration | for a, b in zip(A, B): | mapply(FUN, A, B) |
| List comprehension | [func(a) for a in A] | lapply(...) |
| Function | def myFunc(a): ... return ... | myFunc = function(a){ ... return(...)} |
| Supercompact function | lambda a: a + 1 | function(a) a + 1 |